JAMA Network Open
American Medical Association (AMA)
Preprints posted in the last 30 days, ranked by how well they match JAMA Network Open's content profile, based on 127 papers previously published here. The average preprint has a 0.15% match score for this journal, so anything above that is already an above-average fit.
Wei, M.; Zhang, H.; Peng, Q.
Background: Early initiation of substance use is linked to later adverse outcomes, and risk factors come from multiple domains and are shared across substances. In our previous work, traditional time-to-event Cox models identified individual risk factors, but these models are not designed to jointly model multiple outcomes or capture complex non-linear relationships. Multi-task learning (MTL) can leverage shared structure across related outcomes to improve prediction and distinguish common versus substance-specific predictors. However, most MTL studies rely on baseline features and focus on single outcomes, which limits their ability to capture shared risk and temporal changes. Substance use initiation is a time-dependent process that unfolds during development and reflects changing exposures over time. Baseline-only models cannot capture these changes or represent risk dynamics. Discrete-time modeling provides a practical approach by estimating interval-level initiation risk and aggregating it into cumulative risk at the subject level. By integrating multi-task learning with dynamic modeling, it is possible to share information across outcomes while capturing how risk evolves over time, which may improve prediction performance. Methods: Using the Adolescent Brain Cognitive Development (ABCD) Study (release 5.1), we developed two complementary MTL frameworks to predict initiation of alcohol, nicotine, cannabis, and any substance use. A baseline MTL model predicted fixed-horizon (48-month) initiation using one record per participant, while a dynamic discrete-time MTL model incorporated longitudinal interval data to model time-varying risk. Both models used multi-domain environmental exposures, core covariates, and polygenic risk scores (PRS). Performance was evaluated on a held-out test set using AUROC, PR-AUC, and calibration metrics, and compared with single-task logistic regression (LR). Feature importance was assessed using permutation importance and compared with Cox proportional hazards models. Results: MTL showed comparable or improved performance relative to LR, with larger gains for low-prevalence outcomes (cannabis and nicotine). Incorporating longitudinal information led to consistent improvements across all outcomes. Dynamic models increased AUROC by +0.044 to +0.062 for MTL and +0.050 to +0.084 for LR, indicating that temporal information was the primary driver of performance gains. Feature importance analyses showed modest overlap across methods, with higher agreement between dynamic MTL and Cox models than static MTL. A small set of features, including externalizing behavior, parental monitoring, and developmental factors, was consistently identified across all approaches. Conclusions: Dynamic multi-task learning improves the prediction of substance use initiation by leveraging longitudinal structure and shared information across outcomes. While MTL provides additional gains, incorporating time-varying information is the dominant factor for improving performance. Combining baseline and dynamic frameworks offers a comprehensive strategy for identifying robust risk factors and modeling adolescent substance use initiation.
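The discrete-time mechanics described in the Background can be made concrete with a short sketch: expand longitudinal data into person-period rows, model interval-level hazard, and aggregate to subject-level cumulative risk. The sketch below uses simulated data and a single-task logistic model standing in for the authors' MTL architecture; all column names and effect sizes are illustrative assumptions.

```python
# Minimal discrete-time hazard sketch on simulated person-period data.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy longitudinal data: one row per participant per follow-up interval.
# (A full implementation would censor rows after a participant's first initiation.)
n, t = 200, 8
long_df = pd.DataFrame({
    "pid": np.repeat(np.arange(n), t),
    "interval": np.tile(np.arange(t), n),
    "externalizing": rng.normal(size=n * t),
    "parental_monitoring": rng.normal(size=n * t),
})
logit = -3 + 0.6 * long_df.externalizing - 0.4 * long_df.parental_monitoring
long_df["initiated"] = (rng.random(n * t) < 1 / (1 + np.exp(-logit))).astype(int)

# Discrete-time model: interval-level initiation risk as a function of
# time-varying exposures plus the interval index (a crude baseline hazard).
X = long_df[["externalizing", "parental_monitoring", "interval"]]
model = LogisticRegression(max_iter=1000).fit(X, long_df.initiated)

# Subject-level cumulative risk: 1 minus the product of interval survival.
long_df["hazard"] = model.predict_proba(X)[:, 1]
cum_risk = 1 - long_df.groupby("pid")["hazard"].apply(lambda h: np.prod(1 - h))
print(cum_risk.head())
```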
Akinyemi, O.; Fasokun, M.; Singleton, D.; Ogunyankin, F.; Khalil, S.; Gordon, K.; Michael, M.; Hughes, K.; Luo, G.; Lawson, S.; Ahizechukwu, E.
Introduction Cesarean delivery accounts for nearly one-third of U.S. births and is associated with substantial maternal morbidity and health care costs. Persistent racial disparities have been documented, yet the structural factors contributing to these differences remain incompletely understood. The extent to which insurance coverage shapes racial disparities in cesarean delivery remains unclear. Objective To evaluate the independent and interactive associations of race/ethnicity and insurance coverage with cesarean delivery in the United States. Methods Population-based retrospective cohort study using singleton live births recorded in the United States Vital Statistics Natality files from 2014 to 2024. Multivariable logistic regression was used to estimate the independent effects of race/ethnicity and insurance status on cesarean delivery, including interaction terms to test effect modification, using national birth certificate data. Models were adjusted for maternal demographics, clinical factors, and temporal covariates. Adjusted odds ratios, predicted probabilities, and absolute risk differences were derived from post-estimation marginal effects. The main outcome measure was cesarean delivery (yes vs no). Results Among 41,543,568 deliveries from 2014 to 2024, 13,312,221 (32.0%) were cesarean deliveries. After adjustment, both race/ethnicity and insurance status were independently associated with cesarean delivery. Compared with non-Hispanic White women, non-Hispanic Black women had higher odds of cesarean delivery (odds ratio [OR], 1.22; 95% CI, 1.22-1.23). Relative to uninsured women, those with private insurance had 59% higher odds of cesarean delivery (OR, 1.59; 95% CI, 1.58-1.60). Significant interaction effects were observed, indicating that insurance coverage modified racial and ethnic differences in cesarean delivery. Non-Hispanic Black women had the highest predicted probabilities across all insurance categories, with the largest absolute disparities observed among uninsured women. Conclusion Racial and ethnic differences in cesarean delivery persist in the United States and are modified by insurance coverage, suggesting that coverage-related differences may contribute to inequities in obstetric care.
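As a rough illustration of the analytic approach (not the authors' code), the sketch below fits a logistic model with a race-by-insurance interaction on synthetic data and derives predicted probabilities on a covariate grid, standing in for post-estimation marginal effects; all variable names and effect sizes are invented.

```python
# Interaction model + predicted-probability grid with statsmodels.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 5000
df = pd.DataFrame({
    "race": rng.choice(["nh_white", "nh_black", "hispanic"], n),
    "insurance": rng.choice(["private", "medicaid", "uninsured"], n),
    "age": rng.integers(18, 45, n),
})
p = 0.25 + 0.05 * (df.race == "nh_black") + 0.06 * (df.insurance == "private")
df["cesarean"] = (rng.random(n) < p).astype(int)

# The race x insurance interaction tests whether coverage modifies disparities.
fit = smf.logit("cesarean ~ C(race) * C(insurance) + age", data=df).fit(disp=0)

# Predicted probabilities on a grid play the role of post-estimation margins.
grid = pd.DataFrame(
    [(r, i, 29) for r in df.race.unique() for i in df.insurance.unique()],
    columns=["race", "insurance", "age"],
)
grid["p_cesarean"] = fit.predict(grid)
print(grid.pivot(index="race", columns="insurance", values="p_cesarean"))
```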
Bannett, Y.; Pillai, M.; Huang, T.; Luo, I.; Gunturkun, F.; Hernandez-Boussard, T.
Importance Guideline-concordant care for young children with attention-deficit/hyperactivity disorder (ADHD) includes recommending parent training in behavior management (PTBM) as first-line treatment. However, assessing guideline adherence through manual chart review is time-consuming and costly, limiting scalable and timely quality-of-care measurement. Objective To evaluate the accuracy and explainability of large language models (LLMs) in identifying PTBM recommendations in pediatric electronic health record (EHR) notes as a scalable alternative to manual chart review. Design, Setting, and Participants This retrospective cohort study was conducted in a community-based pediatric healthcare network in California consisting of 27 primary care clinics. The study cohort included children aged 4-6 years with ≥2 primary care visits between 2020 and 2024 and ICD-10 diagnoses of ADHD or ADHD symptoms (n=542 patients). Clinical notes from the first ADHD-related visit were included. A stratified subset of 122 notes, including all cases with model disagreement, was manually annotated to assess model performance in identifying PTBM recommendations and rank model explanations. Exposures Assessment and plan sections of clinical notes were analyzed using three generative large language models (Claude-3.5, GPT-4o, and LLaMA-3.3-70B) to identify the presence of PTBM recommendations and generate explanatory rationales and documentation evidence. Main Outcomes and Measures Model performance in identifying PTBM recommendations (measured by sensitivity, positive predictive value (PPV), and F1-score) and qualitative explainability ratings of model-generated rationales (based on the QUEST framework). Results All three models demonstrated high performance compared to expert chart review. Claude-3.5 showed balanced performance (sensitivity=0.89, PPV=0.95, and F1-score=0.92) and ranked highest in explainability. LLaMA-3.3-70B achieved sensitivity=0.91, PPV=0.89, and F1-score=0.90, ranking second for explainability. GPT-4o had the highest PPV (0.97) but lowest sensitivity (0.82), with an F1-score of 0.89 and the lowest explainability ranking. Based on classifications from the best-performing model, Claude-3.5, 26.4% (143/542) of patients had documented PTBM recommendations at their first ADHD-related visit. Conclusions and Relevance LLMs can accurately extract guideline-concordant clinician recommendations for non-pharmacological ADHD treatment from unstructured clinical notes while providing clear explanations and supporting evidence. Evaluating model explainability as part of LLM implementation for medical chart review tasks can promote transparent and scalable solutions for quality-of-care measurement.
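For readers unfamiliar with this kind of LLM-based chart abstraction, a minimal sketch follows, using the OpenAI Python client as one possible backend. The prompt, JSON schema, and model choice are illustrative assumptions; the study's actual prompts and configurations are not given in the abstract.

```python
# Illustrative structured-extraction call; not the authors' pipeline.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = """You are reviewing the assessment & plan of a pediatric clinic note.
Did the clinician recommend parent training in behavior management (PTBM)?
Respond in JSON: {"ptbm_recommended": true or false,
                  "rationale": "<one sentence>",
                  "evidence": "<verbatim quote from the note>"}"""

def classify_note(note_text: str) -> dict:
    # temperature=0 and JSON mode encourage deterministic, parseable output.
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": PROMPT},
                  {"role": "user", "content": note_text}],
        temperature=0,
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)
```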
Phillips, V.; Woodwal, P.
Background Artificial intelligence and machine learning (AI/ML) are among the fastest-growing domains in NIH research funding, but whether children have shared equitably in this expansion is unknown. We characterized pediatric representation in NIH AI/ML funding from fiscal years (FY) 2020 to 2024. Methods NIH grant data were obtained from Research Portfolio Online Reporting Tools Expenditures and Results bulk files for FY2020 to FY2024. AI/ML grants were identified using the NIH Research, Condition, and Disease Categorization "Machine Learning and Artificial Intelligence" category, and pediatric grants using the "Pediatric" category. Subprojects were excluded. Grants were deduplicated by core project number within each fiscal year for trend analyses, and across all years (retaining the most recent fiscal year's record) for cross-sectional totals. Disease areas were identified by keyword searches of titles and abstracts. Results Across FY2020 to FY2024, 5,624 unique NIH AI/ML grants totaling $3,371 million were identified. Of these, 836 grants (14.9%) were classified as pediatric, representing $401 million (11.9%) of total NIH AI/ML funding. Although this share was consistent with the historically reported overall NIH pediatric funding baseline of approximately 10% to 12%, it remained substantially below the US pediatric population share of approximately 22%. The pediatric share of NIH AI/ML funding declined from 12.3% in FY2020 to 10.8% in FY2024, despite growth in absolute pediatric funding. Indexed to FY2020, pediatric AI/ML funding grew approximately 2.6-fold compared with 3.0-fold growth in the total portfolio. Across disease areas, unadjusted adult/general-to-pediatric funding ratios ranged from 2.0-fold in mental health to 9.8-fold in cancer. Conclusions Pediatric representation in NIH AI/ML funding remained low and declined over time as the overall portfolio expanded. These findings suggest that growth in NIH AI/ML investment has not been matched by proportional gains for pediatric research.
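The deduplication rules in Methods map onto a few lines of pandas; the sketch below uses toy records and assumed column names rather than actual RePORTER bulk-file exports.

```python
# Dedup sketch: one record per project per year, and most-recent-year records.
import pandas as pd

# Toy rows standing in for RePORTER bulk-file records (column names assumed).
grants = pd.DataFrame({
    "core_project_num": ["R01AA111", "R01AA111", "R01BB222", "R01BB222"],
    "fiscal_year":      [2020,       2021,       2023,       2024],
    "subproject_id":    [None,       None,       None,       None],
    "award_amount":     [450_000,    460_000,    300_000,    310_000],
})
grants = grants[grants["subproject_id"].isna()]  # exclude subprojects

# Within-year dedup (trend analyses): one record per project per fiscal year.
yearly = grants.drop_duplicates(["fiscal_year", "core_project_num"])

# Cross-sectional totals: keep each project's most recent fiscal year record.
latest = (grants.sort_values("fiscal_year")
                .drop_duplicates("core_project_num", keep="last"))
print(latest[["core_project_num", "fiscal_year", "award_amount"]])
```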
Nande, A.; Larsen, S. L.; Turtle, J.; Davis, J. T.; Bandekar, S. R.; Lewis, B.; Chen, S.; Contamin, L.; Jung, S.-m.; Howerton, E.; Shea, K.; Bay, C.; Ben-Nun, M.; Bi, K.; Bouchnita, A.; Chen, J.; Chinazzi, M.; Fox, S. J.; Hill, A. L.; Hochheiser, H.; Lemaitre, J. C.; Loo, S. L.; Marathe, M.; Meyers, L. A.; Pearson, C. A. B.; Porebski, P.; Przykucki, E.; Smith, C. P.; Venkatramanan, S.; Vespignani, A.; Willard, T. C.; Yan, K.; Viboud, C.; Lessler, J.; Truelove, S.
Background Six years after its emergence, SARS-CoV-2 continues to have a substantial burden. The impact of vaccination and the optimal timing of its rollout remain uncertain given existing population immunity and variability in outbreak timing between summer and winter. Methods The US Scenario Modeling Hub convened its 19th round of ensemble projections for COVID-19 hospitalizations and deaths in the United States, in which eight teams projected trajectories in each US state and nationally from April 2025 to April 2026 under five scenarios regarding vaccine recommendations and timing. Recommendations had two eligibility scenarios (high-risk individuals only and all-eligible) and two timing scenarios (classic start: mid-August; earlier start: late June). These were crossed to create four scenarios and were compared against a counterfactual scenario with no vaccination. Findings Compared to no vaccination, our ensemble projections estimated 90,000 (95% PI, 53,000-126,000) hospitalizations averted in the high-risk, classic-timing scenario across the US. Expanding to all-eligible age groups averted an additional 26,000 (95% PI, 14,000-39,000) hospitalizations, which, when coupled with the early vaccination timing, was projected to further reduce national hospitalizations by 15,000 (95% PI, -3,000 to 33,000). The majority of teams projected both summer and winter waves. Implications We project that COVID-19 will cause significant hospitalizations and deaths in the US in the 2025-26 season and estimate significant benefits from a broad all-eligible vaccination recommendation. The results also suggest an additional benefit is likely to be gained from an earlier vaccination campaign. Funding Centers for Disease Control and Prevention; National Institutes of Health (US); National Science Foundation (US)
Luff, A.; Rivelli, A.; Akaninyene, N.; Malloy, E.; Mishra, R.; Fitzpatrick, V.
Prenatal depression is a substantial contributor to maternal morbidity, and screening is an entry point to psychiatric assessment and treatment during pregnancy. Following updated guidelines and quality metrics for prenatal depression screening, we evaluated whether screening uptake differed by preferred language within a large U.S. healthcare system. We used electronic health record data to identify a retrospective cohort of deliveries at or beyond 20 weeks gestation in 2019-2024. We used logistic regression with a language-year interaction to estimate the adjusted marginal probabilities of screening by language preference. Among 99,526 pregnancies (82,632 individuals), screening increased substantially over time but increases differed across language groups (p<0.001). In 2019, screening probabilities were similar (English 0.50; Spanish 0.48; Another Language 0.50). By 2024, probabilities diverged (English 0.81; Spanish 0.66; Another Language 0.71). Unequal screening uptake can systematically under-identify prenatal depression among patients with non-English language preference, with implications for equitable access to psychiatric care.
Crabtree, L.; Gheorghe, C. P.
Objective: To externally validate a risk-stratified delivery timing model for nulliparous, term, singleton, vertex (NTSV) cesarean reduction using national data. Design: Population-based cohort study of NTSV births in US National Vital Statistics System (NVSS) natality files, 2020 to 2024, using logistic regression for cesarean predictors and risk-stratified Monte Carlo simulation (10,000 iterations per strategy and risk group) to evaluate delivery timing policies. Setting: All live births in the US recorded in the NVSS natality files. Participants: NTSV patients with term (37+ weeks) pregnancies and complete gestational age and delivery mode data (N=5,776,412). A sensitivity cohort excluded pre-39-week deliveries and pregnancies with preexisting diabetes or hypertension. Exposures: Delivery timing strategies defined by gestational age and labor onset (elective induction at 39, 40, or 41 weeks, or expectant management to 42 weeks), evaluated within maternal age and body mass index (BMI) risk strata (low: age <35 and BMI <30; moderate: age ≥35 or BMI ≥30; high: age ≥35 and BMI ≥35). Main Outcomes and Measures: Primary outcome was cesarean delivery, measured as the proportion of deliveries completed by cesarean across gestational ages, labor onset types, and age-BMI strata. Secondary outcomes included gestational age-specific cesarean rates, area under the receiver operating characteristic curve (AUC) for cesarean prediction, and simulated mean cesarean rates with 95% simulation intervals under four delivery timing strategies within each risk group. Results: The overall NTSV cesarean rate was 26.4%. Cesarean rates were U-shaped across gestational ages, with the lowest rate at 38 weeks (24.9%) and higher rates at 37 weeks (29.8%) and 41 to 42 weeks (28.1% to 28.5%). Risk group distribution was 64.9% low, 33.7% moderate, and 1.4% high. Model AUC was 0.65. Induction had higher cesarean rates than spontaneous labor (29.3% vs 24.2%; odds ratio 1.30, 95% confidence interval 1.29 to 1.30). Monte Carlo simulation favored induction at 39 weeks for high-risk patients (59.3%) and expectant management to 41 to 42 weeks for low-risk patients (19.1%). Conclusions and Relevance: A risk-stratified NTSV labor management model showed external validity in 5.8 million US births and consistently identified risk-specific timing strategies that lowered cesarean rates, supporting individualized delivery timing policies.
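The risk-stratified Monte Carlo step can be illustrated in a few lines: simulate cohorts under each timing strategy and summarize the distribution of cesarean rates with a 95% simulation interval. The probabilities below are placeholders, not the study's fitted values.

```python
# Monte Carlo comparison of delivery-timing strategies within one risk group.
import numpy as np

rng = np.random.default_rng(42)
N_ITER, N_PATIENTS = 10_000, 1_000

# Hypothetical per-strategy cesarean probabilities for a single risk stratum.
strategies = {"induce_39wk": 0.25, "induce_40wk": 0.27,
              "induce_41wk": 0.28, "expectant_42wk": 0.30}

for name, p in strategies.items():
    # Each iteration simulates one cohort and records its cesarean rate.
    rates = rng.binomial(N_PATIENTS, p, size=N_ITER) / N_PATIENTS
    lo, hi = np.percentile(rates, [2.5, 97.5])
    print(f"{name}: mean {rates.mean():.3f}, 95% SI {lo:.3f}-{hi:.3f}")
```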
Abdolahnejad, M.; Kyremeh, M.; Smith, J.; Fang, G.; Chan, H. O.; Joshi, R.; Hong, C.
Background: Atopic dermatitis (AD) is a prevalent chronic inflammatory skin disease associated with clinical, psychosocial, and economic burden. Accurate severity assessment is essential for guiding treatment escalation and monitoring disease activity, yet clinician-based scoring systems such as the Eczema Area and Severity Index (EASI) are limited by subjectivity and considerable inter- and intra-rater variability. Erythema, a key driver of AD severity grading, is particularly prone to inconsistent evaluation due to differences in ambient lighting, device quality, skin tone, and rater experience, underscoring the need for objective, reproducible assessment tools. Objective: To develop and validate an artificial intelligence (AI) pipeline for grading erythema, excoriation, and lichenification severity in AD from clinical photographs. The study evaluated the level of agreement between AI severity ratings in each category against dermatologists, non-specialists, and a consensus reference standard, with erythema as the primary outcome of interest. Methods: A two-stage AI pipeline was developed using EfficientNet-B7 convolutional neural networks (CNNs). The first CNN was trained as a binary AD classifier on 451 AD and 601 non-AD images for lesion detection and segmentation. The second CNN was trained on 173 dermatologist-annotated AD images, each scored on a 0-3 ordinal scale for erythema, excoriation, and lichenification. This CNN was paired with downstream feature-extraction algorithms: red-channel contrast for erythema, Laws' E5L5 for excoriation, and S5L5 texture maps for lichenification. In a cross-sectional validation study, 41 independent test images were scored by two blinded dermatologists and two blinded physicians. AI predictions were compared to individual rater groups and mode-derived consensus scores using weighted Cohen's kappa, classification accuracy, confusion matrices, and error direction analyses. Results: On internal validation, the severity CNN achieved 84% overall accuracy (averaged across all three attributes), 86% sensitivity, 87% specificity, and a macro-averaged area under the receiver operating characteristic curve (AUC) of 0.90. In the external comparison with blinded human raters, erythema agreement between the AI and dermatologist consensus was substantial (accuracy 80.7%; kappa = 0.68), with no large (>2-point) misclassifications. Physician consensus agreement was lower (accuracy 54.8%; kappa = 0.34), reflecting greater variability among primary care physicians (non-specialists). For excoriation, AI-dermatologist agreement was moderate (accuracy 72.4%; kappa = 0.62); for lichenification, agreement was similar (accuracy 71.4%; kappa = 0.59). Across all features, disagreements were predominantly between adjacent severity categories. The AI was able to generate erythema severity grades for images of darker skin tones that dermatologists typically would not rate, instead marking them "unable to assess". Limitations: The validation set was small (41 images), severe cases (score 3) were underrepresented, one rater participated in both training annotation and validation scoring, and sample size was insufficient for robust stratification by skin tone or body site. Conclusion: The AI pipeline demonstrated dermatologist-level accuracy for erythema scoring, consistent moderate agreement for excoriation and lichenification, and a potential advantage in assessing erythema on darker skin tones.
These findings support its potential as a standardized, objective tool for AD severity assessment. Prospective validation in larger, more diverse cohorts is warranted.
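Weighted Cohen's kappa, the main agreement statistic here, is easy to reproduce with scikit-learn; the abstract does not state whether linear or quadratic weights were used, so the sketch below (with toy grades) shows both.

```python
# Agreement between AI and consensus grades on an ordinal 0-3 scale.
from sklearn.metrics import cohen_kappa_score, confusion_matrix

ai   = [0, 1, 2, 2, 1, 0, 3, 2, 1, 1]   # AI erythema grades (toy data)
derm = [0, 1, 2, 3, 1, 0, 3, 2, 2, 1]   # dermatologist consensus grades

print("linear kappa:   ", cohen_kappa_score(ai, derm, weights="linear"))
print("quadratic kappa:", cohen_kappa_score(ai, derm, weights="quadratic"))
print(confusion_matrix(ai, derm))        # off-diagonal cells = disagreements
```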
Fleet, D.; Messenger, A.; Bryden, A.; Harris, M. J.; Holmes, S.; Farrant, P.; Leaker, B.; Takwale, A.; Oakford, M.; Kaur, M.; Mowbray, M.; MacBeth, A.; Gangwani, P.; Gkini, M. A.; Jolliffe, V.
Background There are no licensed treatments for patients with mild-to-moderate patchy alopecia areata (AA). Objectives To evaluate the efficacy, safety and dose response of STS01, a novel nanoparticle controlled-release topical formulation of dithranol/Prosilic. Methods In a phase 2, double-blind study, adult patients with mild-to-moderate AA (guideline-defined 10% to 50% of scalp hair loss) were randomly assigned to STS01 at doses of 0.25%, 0.5%, 1%, 2% or placebo, daily for 6 months. The primary endpoints included the proportion of patients achieving a >=30% improvement in Severity of Alopecia Tool (SALT) score, and percentage change from baseline in SALT score. This minimum level of improvement is generally accepted as an indicator of the population likely to progress to complete regrowth. Results A total of 155 patients were randomized and treated (placebo, n=32; STS01 groups, n=30 to 31). STS01 1% met the primary efficacy endpoint of >=30% SALT score improvement compared to placebo: 75.9% (95% CI, 60.3 to 91.4%) vs 36.7% (95% CI, 19.4 to 53.9%) at 6 months; p=0.0037. The least squares (LS) mean percentage change in SALT score from baseline to end of treatment showed a clear dose-response relationship; STS01 0.5% was the minimally effective dose and 2% the maximum tolerated dose, and there was a statistically significant improvement in the STS01 1% group (-55.0% vs +0.6% with placebo; p<0.01). Significant improvements (p<0.05) in LS mean percentage changes from baseline in SALT scores were demonstrated in the STS01 1% group at 2 months (-28.6% vs +12.8%), 4 months (-57.2% vs +1.5%), and 6 months (-67.0% vs +0.6%). Clinical Global Impression improvement was reported in 72.0% of patients with STS01 1% vs 41.7% with placebo (p<0.05). The most commonly reported treatment-emergent adverse events were skin irritation reactions, but these were mostly mild (STS01: 56.7% to 71.0%; placebo: 21.9%) or moderate (STS01: 13.3% to 35.5%; placebo: 0%) and manageable by reduced frequency of application. There were 15 skin-related discontinuations with STS01 (12.2%) and 2 (6.3%) with placebo. Conclusions STS01 demonstrated a clear dose response, with the STS01 1% dose optimally more effective than placebo for hair regrowth with minimal tolerability concerns in mild-to-moderate patchy AA. Skin irritation reactions were generally manageable and there were no new safety signals. Further characterisation of the STS01 1% dose is planned in a phase 3 study. Chief Investigator AGM reports fees from Soterios Ltd. Chief Statistician DMF is an employee of Soterios Ltd. All other authors were Principal Investigators in the trial and their clinics were reimbursed for the work involved. Most also had sponsorship in the form of consultancies, investigational roles or lecturing roles on behalf of other dermatological pharmaceutical companies.
Yin, Y.; Cheng, Y.; Ling, Y.; Ruser, C.; Altalib, H. H.; Masheb, R. M.; Kravetz, J.; Nelson, S. J.; Ahmed, A.; Faselis, C.; Brandt, C. A.; Zeng-Treitler, Q.
Importance Missed outpatient appointments, including no-shows and cancellations, may disrupt continuity of care and be associated with worse outcomes, but long-term system-wide patterns and clinical implications are not well characterized. Objective To characterize variation in missed appointment rates in the Veterans Health Administration (VHA) over time and by geographic location, visit modality, and preexisting conditions, and to evaluate associations between missed appointment rates and adverse outcomes among veterans with posttraumatic stress disorder (PTSD) or traumatic brain injury (TBI). Design Cohort study using VHA Corporate Data Warehouse outpatient appointment data from January 1, 2000, through December 31, 2024. Setting National integrated health care system of the VHA. Participants The system analysis includes all scheduled outpatient appointments with a valid status, and the outcome analysis includes veterans with PTSD (n = 1,429,890) or TBI (n = 554,553) diagnosed before 2023. Exposures For system-level analyses, missed appointment rates were calculated. In outcome analyses, 2023 missed appointment rates were categorized into tertiles within the cohort and appointment type. Main Outcomes and Measures One-year risks of all-cause hospitalization, all-cause mortality, and hospitalization or death beginning January 1, 2024. Results Among 2,162,520,880 outpatient appointments from 2000 to 2024, 6.5% were no-shows and 25.4% were canceled. Across facilities, no-show rates ranged from 3.5% to 14.1%, patient-initiated cancellation rates from 9.7% to 26.0%, and clinic-initiated cancellation rates from 8.5% to 17.9%. In 2023, veterans with amputation, Parkinson disease, PTSD, or TBI had higher missed appointment rates than veterans without these conditions. Among veterans with PTSD, the highest no-show tertile, compared with none, was associated with higher mortality (HR, 1.91; 95% CI, 1.84-1.98) and hospitalization or death (HR, 1.07; 95% CI, 1.05-1.08). Among veterans with TBI, the highest no-show tertile was associated with hospitalization or death (HR, 1.65; 95% CI, 1.61-1.69). Conclusions and Relevance Missed outpatient appointments were common in the VHA and varied substantially across facilities and over time. Among veterans with PTSD or TBI, higher missed appointment rates, particularly no-shows, were associated with increased risks of hospitalization and mortality, suggesting that these patterns may help identify high-risk veterans for targeted outreach.
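A simplified version of the tertile-exposure survival analysis, here with the lifelines package on synthetic one-year follow-up data; the covariates, coding, and effect sizes are illustrative assumptions rather than the study's model.

```python
# Cox model with no-show tertile dummies on simulated follow-up data.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(7)
n = 2000
tertile = rng.integers(0, 3, n)          # 0=low, 1=mid, 2=high no-show tertile
df = pd.DataFrame({
    "mid_tertile":  (tertile == 1).astype(int),
    "high_tertile": (tertile == 2).astype(int),
    "age": rng.normal(60, 12, n),
})

# Simulate event times with higher hazard in the top tertile, censored at 1 year.
hazard = 0.1 * np.exp(0.6 * df.high_tertile + 0.01 * (df.age - 60))
times = rng.exponential(1 / hazard)
df["time"] = np.minimum(times, 1.0)
df["event"] = (times < 1.0).astype(int)

cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
cph.print_summary(decimals=2)   # hazard ratios with 95% CIs
```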
Malagon, T.; Russell, W. A.; Burnier, J. V.; Dickinson, K.; Brenner, D.
Background Multicancer early detection tests could be used for cancer screening, but may lead to harms, including false positive results and overdiagnosis of indolent tumours that would not have become clinically evident during that person's lifetime. We assessed the potential for these screening harms in the context of future population-based screening with a multicancer early detection test. Methods We used a microsimulation model to assess potential population-level impacts of screening at ages 50-75 years with a multicancer early detection test in Canada. We assumed high test specificity (97-99.1%) and test sensitivity increasing with cancer stage. The model includes latent indolent cancers that would not be diagnosed within that person's lifetime but can be overdiagnosed through screen-detection. We calculated the yearly and cumulative lifetime probabilities of screening overdiagnosis and false positive test results, assuming a range of preclinical screen-detectable periods (2-5 years). Results An estimated 2.1-6.0% of all yearly screen-detected cancers with a multicancer screening test were predicted to be overdiagnoses across scenarios. The proportion of overdiagnosis varied by site, and strongly increased with age, going from 1% at age 50 to over 10% of screen-detected cancers by age 75. The test positive predictive value ranged from 15.9% to 77.6%, meaning that there could be 0.3-5.3 false positives with no underlying cancer for every true cancer case detected by the test. Conclusion Population-level screening with a multicancer early detection test would likely not lead to substantial screen-related overdiagnosis. Healthcare systems should consider how screening false positives may increase their diagnostic service caseload.
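The false-positive arithmetic behind these PPV figures follows directly from Bayes' rule. The sketch below assumes an overall sensitivity of 0.6 and illustrative prevalences (both assumptions, since the model's stage-specific values are not given here), so the outputs bracket rather than reproduce the reported 15.9% to 77.6% range.

```python
# PPV and false positives per true case as functions of spec and prevalence.
def ppv(sens, spec, prev):
    tp = sens * prev                 # true positives per person screened
    fp = (1 - spec) * (1 - prev)     # false positives per person screened
    return tp / (tp + fp)

for spec in (0.97, 0.991):           # the specificity range assumed above
    for prev in (0.005, 0.02):       # illustrative screen-detectable prevalence
        p = ppv(sens=0.6, spec=spec, prev=prev)
        print(f"spec={spec}, prev={prev}: PPV={p:.2f}, "
              f"false positives per true case={1/p - 1:.1f}")
```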
Laskaris, Z.; Baron, S.; Markowitz, S. B.
Objectives Rising temperatures are a major climate-related hazard for U.S. workers, increasing heat-related illness and a broad range of occupational injuries through indirect pathways often overlooked in economic evaluations. We examined the association between temperature and occupational injury and illness and quantified heat-attributable injuries (including illnesses) and costs in New York State. Methods We conducted a time-stratified case-crossover study of 591,257 workers' compensation (WC) claims during the warm season (2016-2024). Daily maximum temperature was linked to injury date and county and modeled using natural cubic splines, with effect modification by industry and worker characteristics. Results Injury risk increased with temperature, becoming statistically significant at approximately 78°F. Relative to 65°F, injury odds increased to 1.06 (95% CI: 1.01-1.10) at 80°F, 1.12 (1.07-1.18) at 90°F, and 1.17 (1.11-1.23) at 95°F. Overall, 5.0% of claims (2,322 annually) were attributable to heat. At temperatures ≥80°F, an estimated 1,729 excess injuries occurred annually, generating approximately $46 million in WC costs. An estimated $3.2 million to $36.1 million in medical expenditures were associated with incomplete claims, likely borne outside the WC system. Conclusions These findings demonstrate substantial economic costs not fully captured within WC and support workplace heat protections as a cost-containment strategy that can reduce health care spending and strengthen workforce resilience.
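The attributable-count logic can be sketched with the reported odds ratios: for claims occurring on days at temperature t, the attributable fraction among the exposed is approximately (OR_t - 1)/OR_t (treating the OR as a rate ratio). The claim counts below are invented for illustration; only the ORs come from the abstract.

```python
# Back-of-envelope heat-attributable claim counts from reported odds ratios.
odds_ratio  = {80: 1.06, 90: 1.12, 95: 1.17}       # vs the 65 degF referent
claims_at_t = {80: 20_000, 90: 8_000, 95: 2_500}   # hypothetical claim counts

# Attributable fraction among exposed: (OR - 1) / OR, summed over claims.
excess = sum(claims_at_t[t] * (odds_ratio[t] - 1) / odds_ratio[t]
             for t in odds_ratio)
print(f"estimated heat-attributable claims: {excess:,.0f}")
```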
Jeong, I.; Lee, T.; Kim, B.; Park, J.-H.; Kim, Y.; Lee, H.
Background Clinical prediction models degrade when deployed across hospitals, yet retraining requires technical expertise, labeled data, and regulatory re-approval. We investigated whether post-hoc retrieval augmentation of a frozen model's output, analogous to retrieval-augmented methods in natural language processing, can mitigate this degradation without any parameter modification. Methods We developed the Post-hoc Retrieval Augmentation Module (PRAM), which combines predictions from a frozen base model with outcome information retrieved from similar patients in a local patient bank. Five base models (logistic regression through CatBoost) and three retrieval strategies were evaluated on 116,010 ICU patients across three databases (MIMIC-IV, MIMIC-III, eICU-CRD) for acute kidney injury (AKI) and mortality prediction. A bank-size deployment simulation modeled performance from zero to full local data accumulation, complemented by source-bank cold start, stress tests, and calibration experiments. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC). Results Retrieval benefit was inversely associated with base model complexity (ρ = -0.90 for AKI, -1.00 for mortality): simpler models benefited more, consistent with retrieval capturing residual signal unexploited by the base model. PRAM showed a statistically significant monotone dose-response between bank size and prediction performance across all six outcome-target combinations (Kendall τ trend test, q = 0.031 for all). At the pre-specified primary comparison (bank = 5,000), the improvement was confirmed for the two largest-shift settings (eICU-CRD AKI: ΔAUROC = +0.012, q < 0.001; eICU-CRD mortality: ΔAUROC = +0.026, q < 0.001). Pre-loading a source bank bridged the cold-start gap, providing an immediate performance gain equivalent to approximately 2,000 to 5,000 local patients. Conclusions PRAM provides a parameter-free adaptation mechanism that requires no model retraining, gradient computation, or regulatory re-evaluation at the deployment site. Effect sizes were modest and did not reach cross-model superiority, but the consistent dose-response pattern and the absence of retraining requirements establish retrieval-based adaptation as a viable approach for clinical model transportability. The retrieval mechanism additionally opens a pathway toward case-based interpretability, where predictions are accompanied by identifiable similar patients from the deploying institution.
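A minimal sketch of the retrieval-augmentation idea, assuming a simple linear blend between the frozen model's probability and the outcome rate among retrieved neighbors; PRAM's actual combination rule, similarity metric, and blending weight may differ.

```python
# Post-hoc retrieval augmentation: frozen model + local patient bank.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(3)

# "Source" training data for the base model, which is then frozen.
X_src = rng.normal(size=(5000, 10))
y_src = (rng.random(5000) < 1 / (1 + np.exp(-X_src[:, 0]))).astype(int)
base = LogisticRegression(max_iter=1000).fit(X_src, y_src)

# Local deployment-site bank with labeled outcomes (accumulates over time).
# A location/effect shift mimics cross-hospital distribution shift.
X_bank = rng.normal(loc=0.3, size=(2000, 10))
y_bank = (rng.random(2000) < 1 / (1 + np.exp(-1.3 * X_bank[:, 0]))).astype(int)
index = NearestNeighbors(n_neighbors=50).fit(X_bank)

def pram_predict(x_new, alpha=0.3):
    """Blend the frozen model's probability with the neighbor outcome rate."""
    p_base = base.predict_proba(x_new)[:, 1]
    _, idx = index.kneighbors(x_new)             # retrieve similar patients
    p_retrieved = y_bank[idx].mean(axis=1)       # their observed outcome rate
    return (1 - alpha) * p_base + alpha * p_retrieved

print(pram_predict(rng.normal(size=(3, 10))))
```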
Walsh, S.; Hahn, W. O.; Williams, W. B.; Hyrien, O.; Yu, P.-C.; Parks, K. R.; Edwards, R. J.; Parks, R.; Barr, M.; Polakowski, L. L.; Tindale, I.; Jones, M.; Yurdadon, C.; Burnham, R.; Yeh, C.-H.; Heptinstall, J.; Seaton, K.; Andriesen, J.; Sagawa, Z.; Miner, M. D.; De Rosa, S.; McElrath, M. J.; Corey, L.; Tomaras, G. D.; Montefiori, D. C.; Haynes, B. F.; Mayer, K. H.; Baden, L. R.
Background: Induction of HIV envelope (Env)-specific broadly neutralizing antibodies (bnAbs) is considered a key objective for HIV-1 vaccine development. One approach is to vaccinate with HIV Env immunogens that initially target the naive B cell receptors of a bnAb type and boost with a series of HIV Env variants. We chose the CH505 transmitted/founder (TF) Env, which has high affinity for the naive B cell receptor of the prototype CD4 binding site (bs) bnAb lineage CH103, as a candidate priming immunogen to induce the initial critical step in CD4bs bnAb development. Methods: HVTN 300 is a first-in-human, open-label Phase 1 study evaluating the safety and immunogenicity of a CH505 TF chimeric (ch) Trimer adjuvanted with 3M-052-AF (a TLR7/8 agonist) + Alum. The immunogen is a recombinant, stabilized chimeric Env trimer protein with the N-terminal sequence of CH505 TF gp120 Env transplanted into the BG505 SOSIP sequence. Participants received the adjuvanted protein administered in both deltoid muscles at months 0, 2, 4, 8, and 12. Results: Adults (n=18) aged 18 to 55 were screened at a single site in Boston, USA, and 13 were enrolled. Local and systemic reactogenicity was typically mild to moderate. One participant had severe pain/tenderness, and five participants reported transient severe systemic symptoms at least once. Five participants chose to stop further vaccination due to reactogenicity. No vaccine-related SAEs occurred. Vaccine-specific B-cell response rates reached 100% two weeks post third and fifth vaccinations. Antibody blocking experiments with monoclonal antibodies demonstrated that most participants had antibodies directed to the CD4bs. Four out of 11 participants had serum neutralization signatures for CD4bs bnAb precursors. Conclusions: No safety concerns were identified. The adjuvanted CH505 TF chTrimer elicited serum antibodies capable of CD4bs-mediated neutralization against strains designed to detect early precursors of the CD4bs B-cell lineages. Trial Registration: NCT04915768 Disclosure: Presented in part at HIVR4P 2024, Lima, Peru, October 6-10, 2024
Wei, M.; Peng, Q.
Background Substance use initiation in adolescence is influenced by both genetic and environmental factors; however, large-scale genetic studies often treat initiation as a binary outcome and underuse longitudinal timing information. Methods We conducted time-to-event (survival) genome-wide association analyses (GWAS) of initiation for four outcomes (alcohol, nicotine, cannabis, and any substance use) using longitudinal follow-up data from the Adolescent Brain Cognitive Development (ABCD) Study. We performed ancestry-stratified GWAS within European (EUR), African (AFR), and Hispanic (HISP) groups, applying consistent quality control and covariate adjustment. Summary statistics were harmonized across ancestries and meta-analyzed using inverse-variance weighted fixed-effects and DerSimonian-Laird random-effects models. We evaluated genomic inflation and heterogeneity (Cochran's Q and I²), identified independent lead variants at genome-wide and suggestive significance thresholds, and assessed cross-trait overlap of associated loci. Results In the multi-ancestry meta-analysis, we observed suggestive association signals across traits (minimum p-values: alcohol ≈ 1 × 10⁻⁷, any ≈ 1 × 10⁻⁷, cannabis ≈ 5 × 10⁻⁸, nicotine ≈ 1 × 10⁻⁸). Nicotine initiation showed one genome-wide significant variant in both fixed- and random-effects meta-analyses (p < 5 × 10⁻⁸). Across traits, suggestive loci demonstrated limited overlap, with the strongest concordance between alcohol and any substance use, consistent with shared liability. Heterogeneity statistics indicated that some loci exhibited cross-ancestry variation in effect estimates. Conclusions Survival GWAS leveraging initiation timing can identify genetic signals that may be missed by binary designs and enables principled multi-ancestry synthesis. Our results highlight both shared and trait-specific genetic contributions to early substance initiation and provide a foundation for downstream functional annotation and integrative modeling with environmental risk factors. These findings demonstrate the value of incorporating developmental timing into genetic discovery and provide a framework for integrating longitudinal risk modeling with genomic analyses.
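The fixed-effects synthesis used here reduces to inverse-variance weighting; the sketch below works one toy variant through the pooled estimate, Cochran's Q, and I², using made-up per-ancestry effect sizes.

```python
# Inverse-variance fixed-effects meta-analysis for a single variant.
import numpy as np
from scipy import stats

beta = np.array([0.21, 0.34, 0.18])   # EUR, AFR, HISP effects (toy values)
se   = np.array([0.05, 0.12, 0.10])

w = 1 / se**2                                   # inverse-variance weights
beta_fe = np.sum(w * beta) / np.sum(w)          # pooled fixed-effects estimate
se_fe = np.sqrt(1 / np.sum(w))
z = beta_fe / se_fe
p = 2 * stats.norm.sf(abs(z))                   # two-sided p-value

# Cochran's Q and I^2 quantify cross-ancestry heterogeneity.
Q = np.sum(w * (beta - beta_fe) ** 2)
I2 = max(0.0, (Q - (len(beta) - 1)) / Q) * 100
print(f"beta={beta_fe:.3f}, p={p:.2e}, Q={Q:.2f}, I2={I2:.1f}%")
```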
Fleet, D. M.; Messenger, A.; Bryden, A.; Harris, M. J.; Holmes, S.; Farrant, P.; Leaker, B.; Takwale, A.; Oakford, M.; Kaur, M.; Mowbray, M.; MacBeth, A.; Gangwani, P.; Gkini, M. A.; Jolliffe, V.
Background In clinical trials for alopecia areata (AA), the treatment effect (percentage of hair loss) is estimated using the Severity of Alopecia Tool (SALT) score. Trials in patients with severe AA (>=50% hair loss) employed a local rating of the SALT score performed at trial sites by different investigators. However, in mild-to-moderate AA (<=50% hair loss), where SALT scores are lower, potential inter-rater variability and margin of error may compromise the results. Objectives To compare Centralised and Local measurement of hair loss in mild-to-moderate AA. Methods In a Phase 2 clinical trial, a centralised measurement of hair loss was performed from photographic images taken using a standardised protocol and professional camera equipment. Local scoring was also undertaken at screening/baseline for eligibility. We assessed: the repeatability of the central system (screening vs baseline values), the reproducibility of the central versus the local rating system, and the potential impact of each method on the endpoints using a Monte Carlo simulation method. Results There was good agreement and consistency of scoring with Central rating. This provided much smaller margins of error, 50% lower than Local rating. The simulations demonstrated that substituting Local rating for Central rating would reduce the likelihood of a statistically significant outcome by at least 50%, depending on the SALT score-defined clinical response endpoint. Conclusions Central rating is most appropriate in the Phase 2 learning stage of clinical development and provides an accurate representation of the quantity of hair loss, minimising error and ensuring consistency in measurements.
Lahtinen, E.; Schigiltchoff, N.; Jia, K.; Kundrot, S.; Palchuk, M. B.; Warnick, J.; Chan, L.; Shigiltchoff, N.; Sawhney, M. S.; Rinard, M.; Appelbaum, L.
Background and aims: Pancreatic ductal adenocarcinoma (PDAC) surveillance is limited to individuals with familial or genetic risk, although most future cases arise outside these groups. In a retrospective study, PRISM, an electronic health record (EHR)-based PDAC risk model, identified individuals in the general population at elevated near-term risk of PDAC. We aimed to prospectively evaluate whether PRISM can identify high-risk individuals beyond current surveillance groups across U.S. health systems. Methods: We performed a prospective multicenter cohort study after deployment of PRISM in April 2023 across 44 U.S. health care organizations. Eligible adults aged ≥40 years without prior PDAC received a single baseline risk score and were assigned to prespecified risk tiers. Patients were followed for incident PDAC for 30 months. We estimated tier-specific 30-month cumulative incidence (positive predictive value, PPV), number needed to screen (NNS), standardized incidence ratios (SIRs), and time from deployment and first high-risk flag to diagnosis. Results: Among 6,282,123 adults assigned a PRISM score, 5,058,067 had follow-up; 3,609 developed PDAC. The highest-risk tier had 30-fold higher PDAC incidence than the study population overall. At the SIR 5 threshold, 30-month cumulative incidence was 0.35% (NNS, 284.2); at SIR 16, 1.14% (NNS, 87.4); and at SIR 30, 2.19% (NNS, 45.7). Median time from deployment to PDAC diagnosis was 9.5 months, and median time from first high-risk flag to diagnosis at SIR 5 was 3.5 years. Shapley additive explanations (SHAP) analyses supported patient- and tier-level interpretability. Conclusions: Prospective deployment of PRISM across multiple U.S. health care organizations identified individuals at elevated near-term risk for PDAC, with substantial risk enrichment and lead time before diagnosis. These findings support the real-world scalability and generalizability of EHR-based risk stratification for risk-adapted early detection. ClinicalTrials.gov identifier NCT05973331
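The NNS figures are simply the reciprocal of tier-specific cumulative incidence; the loop below reproduces their order of magnitude (small discrepancies from the reported values reflect rounding of the incidences quoted in the abstract).

```python
# Number needed to screen from tier-specific 30-month cumulative incidence.
for tier, incidence in [("SIR 5", 0.0035), ("SIR 16", 0.0114), ("SIR 30", 0.0219)]:
    print(f"{tier}: 30-month incidence {incidence:.2%} -> NNS = {1/incidence:.0f}")
```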
Georgiades, K.; Chen, Y.-J.; Johnson, D.; Miller, R.; Wang, L.; Sim, A.; Nolan, E.; Dryburgh, N.; Edwards, J.; O'Byrne, S.; Repchuck, R.; Cost, K. T.; Duncan, L.; Golberg, M.; Duku, E.; Szatmari, P.; Georgiades, S.; MacMillan, H. L.; Waddell, C.
Background Although an expansive body of evidence exists on children's mental health during the COVID-19 pandemic, it is largely restricted to the early phases and lockdowns. This study examines longitudinal changes in child and youth mental health symptoms across two years of the COVID-19 pandemic, with data collection strategically timed to capture variability in pandemic restrictions. Methods A population-based longitudinal study of 1,261 children and youth aged 4-17 years followed prospectively from January 2021 to December 2022, with five waves of data collected in Ontario, Canada. Latent growth curve modelling was used to estimate trajectories of parent-reported mental health symptoms and identify baseline and time-varying covariates associated with variable trajectories. Findings Mental health symptoms were elevated and stable during lockdowns, followed by significant reductions as pandemic restrictions loosened, particularly for oppositional defiant and inattention/hyperactivity symptoms compared to internalizing symptoms. Children without pre-existing clinician diagnosed physical, mental or neurodevelopmental conditions and those not in lockdown at baseline demonstrated relative increases in mental health symptoms during lockdowns; and girls, compared to boys, demonstrated smaller reductions in internalizing symptoms as restrictions loosened. Concurrent and lagged associations between parental distress and children's mental health symptoms varied across the pandemic. Interpretation Variation in symptom trajectories by mental health domain, gender, pandemic restrictions and pre-existing diagnosed conditions underscores the need for tailored, equity-informed pandemic planning and response. Policies designed to optimize the balance between the need to reduce viral community transmission whilst limiting pandemic lockdowns may mitigate adverse impacts on child and youth mental health. Funding Ontario Ministry of Health
Maldonado, A.; Heberer, K.; Lynch, J.; Cogill, S. B.; Nallamshetty, S.; Chen, Y.; Shih, M.-C.; Bress, A. P.; Lee, J.
Importance Semaglutide, a glucagon-like peptide-1 receptor agonist (GLP-1RA), is a highly effective medication to treat type 2 diabetes and obesity. However, concerns about potential suicidality persist, creating clinical uncertainty about its neuropsychiatric safety. Objective To assess risks of suicidality after initiating semaglutide compared to initiating a sodium-glucose cotransporter 2 inhibitor (SGLT2i) and by duration of continuous semaglutide treatment. Design Active-comparator, new-user target trial emulation to estimate inverse probability-weighted marginal cause-specific hazard ratios (HRs). For duration-of-treatment analyses, we used clone-censor-weight methods to estimate exposure-adjusted effects. Setting Veterans Health Administration. Participants U.S. Veterans with type 2 diabetes receiving care from March 1, 2018 to September 1, 2025. Exposure Initiation of semaglutide vs SGLT2i; duration of semaglutide use (≤6, 7-12, >12 months). Outcomes Incident suicidal ideation; suicide attempt or death; and a composite outcome. Results A total of 102,361 Veterans met inclusion criteria, including 11,478 new initiators of semaglutide and 90,883 new initiators of an SGLT2i. After overlap weighting, baseline characteristics were well balanced between treatment groups (mean [SD] age, 60.1 [11.7] years; BMI, 37.8 [6.8] kg/m2; hemoglobin A1c, 7.0% [1.4]; 85.5% male; 61.9% non-Hispanic White). During a median follow-up of 2.2 years, 9077 incident suicidal ideation events and 696 suicide attempts or deaths occurred. The incidence rate of suicidal ideation was 56.3 and 37.7 per 1000 person-years among semaglutide initiators and SGLT2i initiators, respectively (hazard ratio [HR], 0.99; 95% CI, 0.93-1.06; P = .86). For suicide attempts or deaths, the incidence rates were 4.30 and 2.64 per 1000 person-years, respectively (HR, 1.05; 95% CI, 0.84-1.31; P = .86). In adherence-adjusted analyses, sustained semaglutide treatment for more than 12 months, compared with 6 or fewer months, was associated with a 74% lower risk of suicide attempts or deaths (HR, 0.27; 95% CI, 0.14-0.54; P < .001). Conclusion Among U.S. Veterans with type 2 diabetes, initiators of semaglutide were not observed to have an increased risk of suicidality compared with initiators of an SGLT2i. Those with longer semaglutide treatment (beyond 12 months) had decreased risk of suicide attempt or death, suggesting that longer-term treatment is safe and may protect against those outcomes.
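Overlap weighting, the balancing step reported in Results, assigns each treated patient a weight of 1 − PS and each comparator a weight of PS, which emphasizes patients in clinical equipoise. The sketch below demonstrates the idea on synthetic data; the covariates and propensity model are invented for illustration.

```python
# Overlap weighting on simulated new-user data.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n = 20_000
df = pd.DataFrame({"age": rng.normal(60, 12, n),
                   "bmi": rng.normal(37, 7, n),
                   "a1c": rng.normal(7.0, 1.4, n)})

# Treatment assignment depends on BMI, creating baseline imbalance.
ps_true = 1 / (1 + np.exp(-(-2 + 0.05 * (df.bmi - 37))))
df["semaglutide"] = (rng.random(n) < ps_true).astype(int)

# Estimate the propensity score, then apply overlap weights.
cols = ["age", "bmi", "a1c"]
ps = LogisticRegression(max_iter=1000).fit(df[cols], df.semaglutide) \
                                      .predict_proba(df[cols])[:, 1]
df["w"] = np.where(df.semaglutide == 1, 1 - ps, ps)

# Weighted covariate means should be nearly identical across groups.
for col in cols:
    m = df.groupby("semaglutide").apply(lambda g: np.average(g[col], weights=g.w))
    print(col, m.round(2).to_dict())
```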
Appleseth, H.; Felt, J.; Cohn, A. M.; Schmidt, R. J.; Croff, J. M.; Leffingwell, T. R.
Importance: Understanding patterns of substance use and environmental exposures to tobacco, cannabis, and electronic nicotine delivery systems (ENDS) among youth is critical for developing targeted prevention strategies, particularly as co-use of tobacco, ENDS, and cannabis becomes more prevalent. Objective: To identify latent classes of tobacco, ENDS, and cannabis use, and environmental exposures to these products, among adolescents and emerging adults. Design, Setting, and Participants: Data from the Environmental influences on Child Health Outcomes (ECHO) consortium (3rd data release, 2018 to 2022) were analyzed from March 2025 to January 2026. The sample (N=2,786) included early adolescents (ages 11 to 13; n=226, 7.9%), middle adolescents (ages 14 to 17; n=1,248, 43.4%), and late adolescents/emerging adults (ages 18 to 24; n=1,402, 48.7%) from 19 ECHO cohorts. Main Outcomes and Measures: The Youth Risk Behavior Survey, Substance Use module measured experimental and current use of cannabis, ENDS, and tobacco products, as well as daily environmental exposure to tobacco smoke, nicotine aerosols, and cannabis smoke within home and social contexts. A multiple-group latent class analysis was used to identify distinct latent classes of substance use and environmental exposure to tobacco smoke, nicotine aerosols, and cannabis smoke, and to compare class prevalences across early, middle, and late adolescence. Results: Four latent classes were identified: No Use/No Exposure (53%), No Use/Polyexposure (10%), Experimental Use/Low Exposure (22%), and Polysubstance Use/High Polyexposure (14%). Cannabis was the most used substance (34% experimental or current use) and the most common source of environmental exposure (20%), followed by ENDS (26% experimental or current use; 19% environmental exposure) and combustible tobacco (15% use; 19% environmental exposure). The No Use/No Exposure and No Use/Polyexposure classes were primarily made up of early and middle adolescents, whereas the Experimental Use/Low Exposure and Polysubstance Use/High Polyexposure classes primarily consisted of late adolescents and emerging adults. Conclusions: Our study revealed distinct, developmentally patterned groupings of substance use and environmental exposure among US adolescents and emerging adults, highlighting the need for developmentally tailored interventions, messaging, and policies that address both active use and environmental exposure across adolescence.
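Latent class analysis over binary use/exposure indicators can be sketched as a Bernoulli-mixture EM. Real analyses, including the multiple-group extension used here, rely on dedicated software; the data below are simulated and the class structure is arbitrary.

```python
# Compact EM for a binary-indicator latent class model (Bernoulli mixture).
import numpy as np

rng = np.random.default_rng(11)
K, n_items, n = 4, 6, 3000

# Simulate from a "true" 4-class model to have something to recover.
true_pi = np.array([0.5, 0.1, 0.25, 0.15])
true_theta = rng.uniform(0.05, 0.9, size=(K, n_items))
z = rng.choice(K, size=n, p=true_pi)
X = (rng.random((n, n_items)) < true_theta[z]).astype(int)

# EM: theta[k, j] = P(item j = 1 | class k); pi[k] = class prevalence.
pi = np.full(K, 1 / K)
theta = rng.uniform(0.2, 0.8, size=(K, n_items))
for _ in range(300):
    # E-step: responsibilities from class-conditional Bernoulli likelihoods.
    log_lik = (X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
               + np.log(pi))
    log_lik -= log_lik.max(axis=1, keepdims=True)   # numerical stability
    resp = np.exp(log_lik)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: update prevalences and item-response probabilities.
    pi = resp.mean(axis=0)
    theta = np.clip(resp.T @ X / resp.sum(axis=0)[:, None], 1e-6, 1 - 1e-6)

print("estimated class prevalences:", np.round(np.sort(pi), 2))
```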